Recent advances in deep learning have shown exciting promise in filling large holes in natural images with semantically plausible and context-aware details, impacting fundamental image manipulation tasks such as object removal. While these learning-based methods are significantly more effective in capturing high-level features than prior techniques, they can only handle very low-resolution inputs due to memory limitations and difficulty in training. Even for slightly larger images, the inpainted regions appear blurry and unpleasant boundaries become visible. We propose a multi-scale neural patch synthesis approach based on joint optimization of image content and texture constraints, which not only preserves contextual structures but also produces high-frequency details by matching and adapting patches with the most similar mid-layer feature correlations of a deep classification network. We evaluate our method on the ImageNet and Paris Streetview datasets and achieve state-of-the-art inpainting accuracy. We show our approach produces sharper and more coherent results than prior methods, especially for high-resolution images.
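The texture constraint described above rests on nearest-neighbor matching of patches in a mid-layer feature space of a classification network. The following is a minimal NumPy sketch of that matching step only; the function names, the 3x3 patch size, and the cosine-similarity criterion are illustrative assumptions, not the paper's exact formulation, and the feature maps would in practice come from a pretrained network rather than be passed in directly.

```python
import numpy as np

def extract_patches(feat, size=3):
    """Flatten all size x size patches of a (C, H, W) feature map
    into rows of an (N, C*size*size) matrix."""
    C, H, W = feat.shape
    patches = []
    for i in range(H - size + 1):
        for j in range(W - size + 1):
            patches.append(feat[:, i:i + size, j:j + size].ravel())
    return np.stack(patches)

def nearest_patch_match(hole_feat, context_feat, size=3):
    """For each patch of the hole's feature map, return the most
    similar patch (by cosine similarity) from the context region.
    The matched patches serve as texture targets for synthesis."""
    hole_p = extract_patches(hole_feat, size)
    ctx_p = extract_patches(context_feat, size)
    # Normalize rows so the dot product is cosine similarity.
    hole_n = hole_p / (np.linalg.norm(hole_p, axis=1, keepdims=True) + 1e-8)
    ctx_n = ctx_p / (np.linalg.norm(ctx_p, axis=1, keepdims=True) + 1e-8)
    sim = hole_n @ ctx_n.T          # (N_hole, N_context) similarity matrix
    idx = sim.argmax(axis=1)        # best-matching context patch per hole patch
    return ctx_p[idx]
```

In the full multi-scale pipeline, a loss penalizing the distance between each hole patch and its matched context patch would be minimized jointly with the content constraint, coarse to fine across scales.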